Skip to content

<fix>[storage]: symmetric snapshot group disband on chain delete#4043

Open
ZStack-Robot wants to merge 2 commits into
zsv_5.1.0from
sync/tao.gan/fix/ZSV-10538-group-disband-integrity
Open

<fix>[storage]: symmetric snapshot group disband on chain delete#4043
ZStack-Robot wants to merge 2 commits into
zsv_5.1.0from
sync/tao.gan/fix/ZSV-10538-group-disband-integrity

Conversation

@ZStack-Robot
Copy link
Copy Markdown
Collaborator

ungroupAfterDeleted now mirrors ungroupAfterDeleteSingleSnapshot:
regardless of root/data volume type, a snapshot group VO is only
disbanded after ALL of its refs reach snapshotDeleted=true.

Previously the root-volume path immediately removed the group VO once
the root chain finished, leaving data-volume refs as orphans pointing
to a non-existent group. After this change the group lives until every
ref is marked deleted, which is the symmetric behavior with the
single-snapshot path and the precondition the integrity check (landed
in a separate commit) relies on.

Resolves: ZSV-10538

Change-Id: I77707a78706271697463676a756c617a796f6c61

sync from gitlab !9939

taogan21 added 2 commits May 22, 2026 10:27
ungroupAfterDeleted now mirrors ungroupAfterDeleteSingleSnapshot:
regardless of root/data volume type, a snapshot group VO is only
disbanded after ALL of its refs reach snapshotDeleted=true.

Previously the root-volume path immediately removed the group VO once
the root chain finished, leaving data-volume refs as orphans pointing
to a non-existent group. After this change the group lives until every
ref is marked deleted, which is the symmetric behavior with the
single-snapshot path and the precondition the integrity check (landed
in a separate commit) relies on.

Resolves: ZSV-10538

Change-Id: I77707a78706271697463676a756c617a796f6c61
Once symmetric disband is in place, an incomplete snapshot group (some
refs deleted, some still alive on the VM) is sticky and must not be
silently ignored by subsequent operations on the same VM.

Block on the following entries when the VM has any incomplete group
(the incomplete group itself is excluded from the check, so users
always have a path to clear the debt):

  - APIDeleteVolumeSnapshotGroupMsg (other groups): blocked unless
    a new boolean force=true is passed. force is added to the API
    as the operator bypass; other entries deliberately do NOT expose
    force.
  - APICreateVolumeSnapshotGroupMsg: blocked.
  - APIAttachDataVolumeToVmMsg / APIDetachDataVolumeFromVmMsg: blocked.

APIDestroyVmInstanceMsg is intentionally NOT blocked: the new
VolumeSnapshotGroupCascadeExtension keys on VmInstanceVO and
force-cleans every group VO (refs -> archived metadata -> host
backup files -> group VO) when the VM is destroyed, so VM teardown
is never wedged by leftover incomplete groups.

Single-snapshot APIs are also exempt by design.

Resolves: ZSV-10538

Change-Id: I717363667067726a796467787568756d6f7a706e
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 22, 2026

Review Change Stack

Walkthrough

PR 在卷快照组删除流程中增加了强制删除参数和完整性保护。APIDeleteVolumeSnapshotGroupMsg 定义了 force 参数,VolumeSnapshotGroupChecker 提供快照组完整性检查,删除前置校验和 API 拦截阻止不安全删除,级联扩展处理 VM 删除时的清理,快照删除后重写清理逻辑实现对称处理。

Changes

卷快照组删除保护与清理

层 / 文件 概述
API 消息中的强制删除参数
header/src/main/java/org/zstack/header/storage/snapshot/group/APIDeleteVolumeSnapshotGroupMsg.java
在 APIDeleteVolumeSnapshotGroupMsg 中新增 force 布尔参数(默认 false),作为删除操作的可选输入控制。
快照组完整性检查基础设施
storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupChecker.java
在 VolumeSnapshotGroupChecker 中实现 findIncompleteGroupsOnVm 方法,用于查询 VM 上未完成(仍有存活引用的已删除快照)的快照组列表。
删除前置校验与 API 拦截
storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupBase.java, storage/src/main/java/org/zstack/storage/volume/VolumeApiInterceptor.java
在 VolumeSnapshotGroupBase.handleDelete 和 VolumeApiInterceptor 中对非强制删除场景增加前置校验,当 VM 上存在未完成快照组时阻止后续操作。
VM 删除时的级联快照组清理
storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupCascadeExtension.java
新增 VolumeSnapshotGroupCascadeExtension 以处理 VM 删除时的快照组级联清理,按顺序删除引用记录、元数据归档、备份文件及快照组 VO。
快照删除后的对称组清理
storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java
重写 VolumeSnapshotTreeBase.ungroupAfterDeleted,实现对称的快照组清理逻辑:标记引用已删除,统计组的剩余活跃引用,仅在无活跃引用时删除组相关数据。

🎯 3 (Moderate) | ⏱️ ~25 minutes

🐰 快照组删除需谨慎,
Force 参数添得好妙,
完整性检查护航,
级联清理不遗漏,
对称清理讲究巧!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed 标题准确概括了主要变更:使快照组解散逻辑对根卷和数据卷的处理对称化,这是PR的核心目的。
Description check ✅ Passed 描述清晰说明了变更内容:ungroupAfterDeleted现在在所有快照引用标记为删除后才解散组,与单快照删除路径行为对称。
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/tao.gan/fix/ZSV-10538-group-disband-integrity

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupChecker.java (1)

46-68: ⚡ Quick win

建议把逐组两次 count() 的查询改为批量聚合,降低热点路径开销

当前实现对每个快照组都发起两次计数查询,在 VM 组数量较多时会放大 DB 往返次数。建议一次性拉取 refs 后聚合(或使用分组统计),保持语义不变并减少拦截链路延迟。

可参考的重构方向
-        List<String> incomplete = new ArrayList<>();
-        for (Object o : groupUuids) {
-            String guuid = o.toString();
-            if (guuid.equals(excludeGroupUuid)) {
-                continue;
-            }
-            long deletedRefs = Q.New(VolumeSnapshotGroupRefVO.class)
-                    .eq(VolumeSnapshotGroupRefVO_.volumeSnapshotGroupUuid, guuid)
-                    .eq(VolumeSnapshotGroupRefVO_.snapshotDeleted, true).count();
-            if (deletedRefs == 0) {
-                continue;
-            }
-            long totalRefs = Q.New(VolumeSnapshotGroupRefVO.class)
-                    .eq(VolumeSnapshotGroupRefVO_.volumeSnapshotGroupUuid, guuid).count();
-            if (deletedRefs < totalRefs) {
-                incomplete.add(guuid);
-            }
-        }
-        return incomplete;
+        List<Tuple> refs = Q.New(VolumeSnapshotGroupRefVO.class)
+                .in(VolumeSnapshotGroupRefVO_.volumeSnapshotGroupUuid, groupUuids)
+                .select(VolumeSnapshotGroupRefVO_.volumeSnapshotGroupUuid, VolumeSnapshotGroupRefVO_.snapshotDeleted)
+                .listTuple();
+
+        Map<String, long[]> stats = new HashMap<>();
+        for (Tuple t : refs) {
+            String guuid = t.get(0, String.class);
+            boolean deleted = t.get(1, Boolean.class);
+            long[] s = stats.computeIfAbsent(guuid, k -> new long[2]); // [deleted, total]
+            if (deleted) {
+                s[0]++;
+            }
+            s[1]++;
+        }
+
+        return stats.entrySet().stream()
+                .filter(e -> !Objects.equals(e.getKey(), excludeGroupUuid))
+                .filter(e -> e.getValue()[0] > 0 && e.getValue()[0] < e.getValue()[1])
+                .map(Map.Entry::getKey)
+                .collect(Collectors.toList());
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupChecker.java`
around lines 46 - 68, The loop in VolumeSnapshotGroupChecker that calls two
per-group Q.count() queries on VolumeSnapshotGroupRefVO (one for
snapshotDeleted=true and one for total) is causing DB hot spots; replace it with
a single batched/grouped query that fetches counts per volumeSnapshotGroupUuid
(e.g., SELECT volumeSnapshotGroupUuid, SUM(snapshotDeleted), COUNT(*) GROUP BY
volumeSnapshotGroupUuid) for all groupUuids at once, then iterate the returned
aggregated results to populate the existing incomplete list (keeping the
excludeGroupUuid check and semantics of comparing deletedRefs < totalRefs)
instead of issuing per-group queries in the for-loop over groupUuids.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java`:
- Around line 2155-2175: The code that computes remaining refs for a group uses
VolumeSnapshotGroupRefVO.snapshotDeleted to decide whether a group can be
disbanded, but memory snapshot VOs deleted by cleanUpMemorySnapshots() are
removed without setting their VolumeSnapshotGroupRefVO.snapshotDeleted=true, so
remaining will remain >0 and groups never disband; fix by ensuring memory
snapshots removed in cleanUpMemorySnapshots() also mark their corresponding
VolumeSnapshotGroupRefVO.snapshotDeleted=true (or add their group ref UUIDs to
the same snapshots-to-delete set before the remaining-count check), i.e. update
cleanUpMemorySnapshots() or the deletion flow to reuse the group-ref clearing
logic used here so VolumeSnapshotGroupRefVO entries reflect the deletion and
allow groupsToDelete to be computed correctly.

---

Nitpick comments:
In
`@storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupChecker.java`:
- Around line 46-68: The loop in VolumeSnapshotGroupChecker that calls two
per-group Q.count() queries on VolumeSnapshotGroupRefVO (one for
snapshotDeleted=true and one for total) is causing DB hot spots; replace it with
a single batched/grouped query that fetches counts per volumeSnapshotGroupUuid
(e.g., SELECT volumeSnapshotGroupUuid, SUM(snapshotDeleted), COUNT(*) GROUP BY
volumeSnapshotGroupUuid) for all groupUuids at once, then iterate the returned
aggregated results to populate the existing incomplete list (keeping the
excludeGroupUuid check and semantics of comparing deletedRefs < totalRefs)
instead of issuing per-group queries in the for-loop over groupUuids.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fd11e4ef-283b-4cdb-90e4-a702f059e26d

📥 Commits

Reviewing files that changed from the base of the PR and between fe6e5e1 and 467e800.

⛔ Files ignored due to path filters (1)
  • conf/springConfigXml/volumeSnapshot.xml is excluded by !**/*.xml
📒 Files selected for processing (6)
  • header/src/main/java/org/zstack/header/storage/snapshot/group/APIDeleteVolumeSnapshotGroupMsg.java
  • storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java
  • storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupBase.java
  • storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupCascadeExtension.java
  • storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupChecker.java
  • storage/src/main/java/org/zstack/storage/volume/VolumeApiInterceptor.java

Comment on lines +2155 to +2175
Set<String> touchedGroupUuids = snapshots.stream()
.map(VolumeSnapshotInventory::getGroupUuid)
.filter(Objects::nonNull)
.collect(Collectors.toSet());

List<String> groupsToDelete = new ArrayList<>();
for (String groupUuid : touchedGroupUuids) {
long remaining = Q.New(VolumeSnapshotGroupRefVO.class)
.eq(VolumeSnapshotGroupRefVO_.volumeSnapshotGroupUuid, groupUuid)
.eq(VolumeSnapshotGroupRefVO_.snapshotDeleted, false).count();
if (remaining == 0) {
logger.debug(String.format("snapshot group[uuid:%s] all volume snapshots have been deleted, " +
"disbanding group", groupUuid));
groupsToDelete.add(groupUuid);
}
}

groupUuids.forEach(groupUuid -> vidm.deleteArchiveVmInstanceResourceMetadataGroup(groupUuid));
cleanVmHostBackupFilesForGroup(groupUuids);
dbf.removeByPrimaryKeys(groupUuids, VolumeSnapshotGroupVO.class);
if (!groupsToDelete.isEmpty()) {
groupsToDelete.forEach(groupUuid -> vidm.deleteArchiveVmInstanceResourceMetadataGroup(groupUuid));
cleanVmHostBackupFilesForGroup(groupsToDelete);
dbf.removeByPrimaryKeys(groupsToDelete, VolumeSnapshotGroupVO.class);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

这里会把同组的 Memory ref 永久算成“未删”,导致快照组无法解散。

Line 2162 开始的 remaining 统计会把整组里所有 snapshotDeleted=false 的 ref 都算进去,但这里前面只标记了 snapshots 本身的 ref。链路删除 root/data 快照时,同组的 memory snapshot 会在后面的 cleanUpMemorySnapshots() 里被直接硬删 VO/tree,却不会把对应的 VolumeSnapshotGroupRefVO.snapshotDeleted 置为 true。这样只要组里带有 memory ref,这里就会一直得到 remaining > 0,最终把该 VM 长期留在“不完整快照组”状态。建议把本次一并删除的 memory snapshot ref 也纳入标记,或者让 cleanUpMemorySnapshots() 复用同样的 group-ref 清理逻辑。

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java`
around lines 2155 - 2175, The code that computes remaining refs for a group uses
VolumeSnapshotGroupRefVO.snapshotDeleted to decide whether a group can be
disbanded, but memory snapshot VOs deleted by cleanUpMemorySnapshots() are
removed without setting their VolumeSnapshotGroupRefVO.snapshotDeleted=true, so
remaining will remain >0 and groups never disband; fix by ensuring memory
snapshots removed in cleanUpMemorySnapshots() also mark their corresponding
VolumeSnapshotGroupRefVO.snapshotDeleted=true (or add their group ref UUIDs to
the same snapshots-to-delete set before the remaining-count check), i.e. update
cleanUpMemorySnapshots() or the deletion flow to reuse the group-ref clearing
logic used here so VolumeSnapshotGroupRefVO entries reflect the deletion and
allow groupsToDelete to be computed correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants